Speech Endpoint Detection Based on Sub-band Energy and Harmonic Structure of Voice
نویسندگان
چکیده
This paper presents an algorithm of speech endpoint detection in noisy environments, especially those with non-stationary noise. The input signal is firstly decomposed into several sub-bands. In each sub-band, an energy sequence is tracked and analyzed separately to decide whether a temporal segment is stationary or not. An algorithm of voiced speech detection based on the harmonic structure of voice is brought forward, and it is applied in the non-stationary segment to check whether it contain speech or not. The endpoints of speech are finally determined according to the combination of energy detection and voice detection. Experiments in real noise environments show that the proposed approach is more reliable compared with some standard methods.
منابع مشابه
Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection
Voice activity detection (VAD) in real-world noise is a very challenging task. In this paper, a two-step methodology is proposed to solve the problem. First, segments with non-stationary components, including speech and dynamic noise, are located using sub-band energy sequence analysis (SESA). Secondly, voice is detected within the selected segments employing the proposed method concerning its ...
متن کاملA New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
متن کاملVoice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملClassification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
متن کاملSpeech event detection using multiband modulation energy
The need for efficient, sophisticated features for speech event detection is inherent in state of the art processing, enhancement and recognition systems. We explore ideas and techniques from non-linear speech modeling and analysis, like modulations and multiband filtering and propose new energy and spectral content features derived through filtering in multiple frequency bands and tracking dom...
متن کامل